ABSTRACT 

Sequence-derived structural and physicochemical 
features have been extensively used for analyzing 
and predicting structural, functional, expression 
and interaction profiles of proteins and peptides. 
PROFEAT has been developed as a web server for 
computing commonly used features of proteins and 
peptides from amino acid sequence. To facilitate 
more extensive studies of proteins and peptides, 
numerous improvements and updates have been 
made to PROFEAT. We added new functions for 
computing descriptors of protein-protein and 
protein-small molecule interactions, segment 
descriptors for local properties of protein sequences, 
and topological descriptors for peptide sequences and 
small molecule structures. We also added new 
feature groups for proteins and peptides (pseudo-amino 
acid composition, amphiphilic pseudo-amino 
acid composition, total amino acid properties and 
atomic-level topological descriptors) as well as 
for small molecules (atomic-level topological 
descriptors). Overall, PROFEAT computes 11 feature 
groups of descriptors for proteins and peptides, 
and a feature group of more than 400 descriptors 
for small molecules, plus the derived features for 
protein-protein and protein-small molecule 
interactions. Our computational algorithms have been 
extensively tested and used in a number of published 
works for predicting proteins of specific 
structural or functional classes, protein-protein 
interactions, peptides of specific functions and 
quantitative structure-activity relationships of small 
molecules. PROFEAT is accessible free of charge at 
http://bidd.cz3.nus.edu.sg/cgi-bin/prof/protein/profnew.cgi. 

INTRODUCTION 

Sequence-derived structural and physicochemical features 
are highly useful for representing and distinguishing 
proteins or peptides of different structural, functional 
and interaction properties, and have been widely used in 
developing methods and software for predicting protein 
structural and functional classes (1-7), protein-protein 
interactions (8-10), protein-ligand interactions (11,12), 
protein substrates (13,14), molecular binding sites on 
proteins (15-20), subcellular locations (21), protein 
crystallization propensity (22-24) and peptides of specific 
properties (25-30). Web servers, such as PROFEAT 
(31) and PseAAC (http://www.csbio.sjtu.edu.cn/bioinf/PseAA/) 
(32), have been built to facilitate the computation 
of protein and peptide features. 

Nonetheless, some features important for studying 
proteins, peptides and molecular interactions have not 
been provided in these web servers. Examples of these 
features include atomic-level topological descriptors that 
are useful for structure-property correlations (33) and 
descriptors of total amino acid properties (TAAPs) that have 
been used for modeling protein conformational stability 
(34), ligand binding site structural features (35) and 
interaction with small molecules (36). Moreover, the 
descriptors provided in those available web servers are not 
suitable for analyzing local properties of sequence 
subsections, and additional work is needed to use 
descriptors to study protein-protein and protein-ligand 
interactions. Therefore, it is desirable to provide segment 
descriptors for local properties of subsections of protein 
sequences, and descriptors that can be straightforwardly 
used for exploring protein-protein and protein-small 
molecule interactions. 

We updated PROFEAT by adding new functions for 
computing descriptors of protein-protein and protein- 
small molecule interactions, segment descriptors for 
local properties of subsections of protein sequences, 
atomic-level topological descriptors for peptide sequences 
and small molecule structures, and topological polar 
surface areas of small molecules. Moreover, we added 
new feature groups such as pseudo-amino acid composition 
(PAAC), amphiphilic PAAC (APAAC), TAAPs, and 
atomic-level topological descriptors. The computational 
algorithms of these newly added feature groups have 
been extensively tested and used in a number of published 
works for predicting proteins and peptides of specific 
properties, protein-protein interactions, and quantitative 
structure-activity relationships of small molecules. A list of 
publications using features covered by PROFEAT is 
provided in Supplementary Table S1 and on the PROFEAT 
online server at 
http://bidd.cz3.nus.edu.sg/prof/part_of_publications.htm. The PROFEAT 
homepage is shown in Figure 1. The features for 
proteins and peptides covered by this version of 
PROFEAT are summarized in Table 1, and the 
topological descriptors for peptides and small molecules 
computed by PROFEAT are listed in Supplementary 
Table S2. 



METHODS FOR NEWLY ADDED FEATURES AND FUNCTIONS 

PAAC descriptors 

First, three variables are derived from the original 
hydrophobicity values $H_1^0(i)$, hydrophilicity values $H_2^0(i)$ 
and side chain masses $M^0(i)$ of the 20 amino acids (i = 1, 
2, ..., 20) (32). 
Then, a correlation function can be computed as: 

$\Theta(R_i, R_j) = \frac{1}{3}\left\{[H_1(R_i) - H_1(R_j)]^2 + [H_2(R_i) - H_2(R_j)]^2 + [M(R_i) - M(R_j)]^2\right\}$ 

from which sequence order-correlated factors are 
defined, where $\lambda$ (< N) is a parameter. Let $f_i$ be the normalized occurrence 
frequency of the 20 amino acids in the protein sequence; a set 
of 20 + $\lambda$ descriptors called the PAAC is then defined, 
where w is the weighting factor for the sequence-order 
effect and is set to w = 0.05 as suggested by Shen (32). 
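
For reference, the PAAC construction outlined above can be written out in full under the standard pseudo-amino acid composition formulation of reference (32). The block below is a sketch under that assumption: the symbols $\theta_k$ and $X_u$ are assumed notation, the normalization is taken over the 20 amino acids, and $H_2(i)$ and $M(i)$ are obtained from $H_2^0(i)$ and $M^0(i)$ in the same way as $H_1(i)$.

```latex
% Normalization of a property over the 20 amino acids
% (shown for H_1; H_2 and M are treated identically)
H_1(i) = \frac{H_1^0(i) - \frac{1}{20}\sum_{k=1}^{20} H_1^0(k)}
              {\sqrt{\frac{1}{20}\sum_{k=1}^{20}
               \left[H_1^0(k) - \frac{1}{20}\sum_{m=1}^{20} H_1^0(m)\right]^2}}

% Sequence order-correlated factors for a sequence R_1 R_2 ... R_N
\theta_k = \frac{1}{N-k}\sum_{i=1}^{N-k} \Theta(R_i, R_{i+k}),
\qquad k = 1, 2, \ldots, \lambda

% The 20 + \lambda PAAC descriptors
X_u =
\begin{cases}
\dfrac{f_u}{\sum_{i=1}^{20} f_i + w\sum_{j=1}^{\lambda}\theta_j},
  & 1 \le u \le 20 \\[2ex]
\dfrac{w\,\theta_{u-20}}{\sum_{i=1}^{20} f_i + w\sum_{j=1}^{\lambda}\theta_j},
  & 21 \le u \le 20+\lambda
\end{cases}
```

The first 20 components carry the conventional amino acid composition, while the remaining $\lambda$ components carry the sequence-order information, scaled by w.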

APAAC 

From $H_1(i)$ and $H_2(i)$ defined in Equations (1) and (2), the 
hydrophobicity and hydrophilicity correlation functions 
are defined (32), respectively, as: 

$H^1_{i,j} = H_1(i) H_1(j)$, $H^2_{i,j} = H_2(i) H_2(j)$ 

from which sequence order factors can be defined. 

Figure 1. PROFEAT new web page. The page offers feature computation for (A) proteins, (B) protein-protein interaction pairs, (C) small molecules and (D) protein-ligand interaction pairs. Sequences must be provided in RAW or FASTA format, and a batch query accepts a maximum of 1000 sequences in FASTA format. 



W388 Nucleic Acids Research, 2011, Vol. 39, Web Server issue 



Table 1. List of PROFEAT computed features for proteins, peptides and protein-protein interactions 
a The number depends on the choice of the number of amino acid properties and the choice of the maximum value of the lag. 
b The number depends on the choice of the number of sets of amino acid properties and the choice of the k value. 
c The number depends on the choice of the λ value. 
d The numbers depend on the choice of the number of amino acid properties. 

For the APAAC descriptors, w is the weighting factor and is taken as w = 0.5. 
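
The APAAC sequence-order factors and descriptor set can be written out under the standard amphiphilic pseudo-amino acid composition formulation. The block below is a sketch under that assumption: $\tau_k$, $P_u$, $f_u$, $\lambda$ and N (sequence length) are assumed notation mirroring the PAAC section, and the products $H_1(R_i)H_1(R_{i+k})$ and $H_2(R_i)H_2(R_{i+k})$ are the hydrophobicity and hydrophilicity correlation functions of the APAAC section evaluated for residues k positions apart.

```latex
% Sequence-order factors: odd indices use hydrophobicity,
% even indices use hydrophilicity
\tau_{2k-1} = \frac{1}{N-k}\sum_{i=1}^{N-k} H_1(R_i)\,H_1(R_{i+k}), \qquad
\tau_{2k}   = \frac{1}{N-k}\sum_{i=1}^{N-k} H_2(R_i)\,H_2(R_{i+k}), \qquad
k = 1, 2, \ldots, \lambda

% The 20 + 2\lambda APAAC descriptors
P_u =
\begin{cases}
\dfrac{f_u}{\sum_{i=1}^{20} f_i + w\sum_{j=1}^{2\lambda}\tau_j},
  & 1 \le u \le 20 \\[2ex]
\dfrac{w\,\tau_{u-20}}{\sum_{i=1}^{20} f_i + w\sum_{j=1}^{2\lambda}\tau_j},
  & 21 \le u \le 20+2\lambda
\end{cases}
```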

Topological descriptors at atomic level 

Topological descriptors are based on graph theory 
and encode information about the types of atoms and 
bonds in a molecule and the nature of their connections. 
Examples of topological descriptors include counts of 
atom and bond types and indexes that encode the size, 
shape and types of branching in a molecule (37). These 
descriptors can be calculated from the 2D structure of a 
peptide automatically generated from its sequence based 
on the molecular structures of the amino acid residues in 
the sequence. Supplementary Table S2 gives a list of the 
topological descriptors computed by PROFEAT. 

TAAP 

The TAAP descriptor for a specific physicochemical property i 
is defined as $P_{tot}(i) = \sum_{j=1}^{N} P^i_{norm,j} / N$, where $P^i_{norm,j}$ 
represents the property i of amino acid $R_j$ normalized 
between 0 and 1 using the expression 

$P^i_{norm,j} = (P^i_j - P^i_{min}) / (P^i_{max} - P^i_{min})$, 

where $P^i_j$ is the original amino acid property i for residue j, 
$P^i_{max}$ and $P^i_{min}$ are, respectively, the maximum and minimum values of the 
original amino acid property i, and N is the length of 
the sequence (38-40). 
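
The TAAP definition above can be illustrated with a minimal Python sketch. The function name `taap` and the three-residue property scale in the usage note are hypothetical placeholders, not values used by PROFEAT; the sketch also assumes that the minimum and maximum are taken over the residues present in the supplied scale.

```python
def taap(sequence, prop):
    """Total amino acid property for one physicochemical scale.

    sequence: string of one-letter residue codes.
    prop: dict mapping residue code -> raw property value (hypothetical input).
    """
    if not sequence:
        raise ValueError("sequence must not be empty")
    p_min, p_max = min(prop.values()), max(prop.values())
    if p_max == p_min:
        raise ValueError("property scale has no spread; cannot normalize")
    # Min-max normalize every residue value into [0, 1].
    norm = {aa: (v - p_min) / (p_max - p_min) for aa, v in prop.items()}
    # Average the normalized values over the N residues of the sequence.
    return sum(norm[res] for res in sequence) / len(sequence)
```

For a made-up scale such as `{"A": 1.0, "G": 0.0, "L": 3.0}`, calling `taap("AGL", scale)` averages the normalized values 1/3, 0 and 1 over the three residues.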

Protein-protein interaction descriptors 

Protein-protein interaction descriptors can be computed 
from the descriptors $V_a = \{V_a(i), i = 1, 2, \ldots, n\}$ and 
$V_b = \{V_b(i), i = 1, 2, \ldots, n\}$ of individual proteins A and 
B by three methods. In the first method, two protein-pair 
vectors $V_{ab}$ and $V_{ba}$ with dimension of 2n are constructed, 
with $V_{ab} = (V_a, V_b)$ for interaction between proteins A 
and B and $V_{ba} = (V_b, V_a)$ for interaction between 
proteins B and A (8,9). In the second method, one 
vector V with dimension of 2n is constructed: 

$V = \{V_a(i) + V_b(i), V_a(i) \times V_b(i), i = 1, 2, \ldots, n\}$, which 
has the property that V is unchanged when a and b are 
exchanged. In the third method, one vector V with dimension 
of $n^2$ is constructed by the tensor product: 

$V = \{V(k) = V_a(i) \times V_b(j), i = 1, 2, \ldots, n, j = 1, 2, \ldots, n, k = (i - 1) \times n + j\}$. 
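
The three pair-descriptor constructions can be sketched in a few lines of Python. The function names are hypothetical, and the ordering of the second method (all sums followed by all products) is an assumption, since the text does not fix the layout of the 2n components.

```python
def pair_concat(va, vb):
    """Method 1: the two ordered concatenations (V_a, V_b) and (V_b, V_a)."""
    _check_same_length(va, vb)
    return list(va) + list(vb), list(vb) + list(va)


def pair_symmetric(va, vb):
    """Method 2: element-wise sums and products, invariant to swapping A and B."""
    _check_same_length(va, vb)
    sums = [a + b for a, b in zip(va, vb)]
    products = [a * b for a, b in zip(va, vb)]
    return sums + products


def pair_tensor(va, vb):
    """Method 3: tensor product; entry for (i, j) lands at k = (i - 1) * n + j."""
    _check_same_length(va, vb)
    return [a * b for a in va for b in vb]


def _check_same_length(va, vb):
    if len(va) != len(vb):
        raise ValueError("both protein descriptor vectors must have length n")
```

The first method is order dependent, so both orderings are returned; the second yields the same vector whichever protein is listed first; the third grows quadratically with n.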

Protein-ligand interaction descriptors 

The protein-ligand interaction descriptor vector V can be 
constructed from the protein descriptor vector $V_p$ ($V_p(i)$, 
$i = 1, \ldots, n_p$) and the ligand descriptor vector $V_l$ ($V_l(i)$, 
$i = 1, \ldots, n_l$) by two methods similar to the first and 
third methods for constructing protein-pair descriptors. 
In the first method, one vector V with dimension of 
$n_p + n_l$ is constructed, $V = (V_p, V_l)$, for interaction 
between protein and ligand. In the second method, one 
vector V with dimension of $n_p \times n_l$ is constructed by the 
tensor product: 

$V = \{V(k) = V_p(i) \times V_l(j), i = 1, 2, \ldots, n_p, j = 1, 2, \ldots, n_l, k = (i - 1) \times n_l + j\}$. 
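
A minimal Python sketch of the two protein-ligand constructions follows. The function names are hypothetical. In the tensor product the flat position used here is k = (i - 1) × n_l + j, a choice made in this sketch so that every (i, j) pair receives a distinct index between 1 and n_p × n_l even when the two vectors differ in length.

```python
def protein_ligand_concat(vp, vl):
    """Method 1: concatenation (V_p, V_l) of dimension n_p + n_l."""
    return list(vp) + list(vl)


def protein_ligand_tensor(vp, vl):
    """Method 2: tensor product of dimension n_p * n_l.

    With 1-based i and j, the entry V_p(i) * V_l(j) is stored at
    1-based position k = (i - 1) * n_l + j.
    """
    return [p * l for p in vp for l in vl]
```

Unlike the protein-pair case, the protein and ligand vectors generally have different lengths, so no symmetric sum-and-product variant is defined.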

Segmented sequence descriptors 

To characterize the local feature of a protein sequence, a 
protein sequence can be divided into several segments and 
descriptors are calculated for each segment. 
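
The idea can be sketched in Python as follows. This is an illustration under stated assumptions, not PROFEAT's segmentation scheme: the sequence is cut into contiguous segments of near-equal length, and plain amino acid composition stands in for whichever descriptor is computed per segment. All function names are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def split_segments(sequence, n_segments):
    """Cut a sequence into n contiguous segments of near-equal length."""
    if not 1 <= n_segments <= len(sequence):
        raise ValueError("n_segments must be between 1 and the sequence length")
    base, extra = divmod(len(sequence), n_segments)
    segments, start = [], 0
    for k in range(n_segments):
        # The first `extra` segments take one additional residue.
        end = start + base + (1 if k < extra else 0)
        segments.append(sequence[start:end])
        start = end
    return segments


def composition(segment):
    """Fraction of each of the 20 amino acids in one segment."""
    counts = Counter(segment)
    return [counts[aa] / len(segment) for aa in AMINO_ACIDS]


def segment_descriptors(sequence, n_segments):
    """One descriptor vector per segment, in sequence order."""
    return [composition(seg) for seg in split_segments(sequence, n_segments)]
```

Concatenating the per-segment vectors gives a single feature vector whose blocks describe successive regions of the protein.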

Topological descriptors for small molecules 

For small molecules, topological descriptors are calculated 
from the input 2D structures of small molecules in mol or 
sdf format. Names of these descriptors are the 
same as those for protein segments which are listed in 
Supplementary Table S2. 

REMARKS 

Compared with its earlier version, the updated PROFEAT 
is significantly enhanced in both the number of newly 
added features useful for representing various protein 
properties, and newly added functions for computing 
features for local properties of protein segments, 
protein-protein interactions, protein-small molecule 
interactions and small molecules. These enhancements 
are intended to provide more comprehensive features for 
facilitating the analysis and prediction of proteins, 
peptides and small molecules of different properties, and of 
molecular interactions involving proteins, peptides and small 
molecules. With continued interest in using molecular and 
interaction features and developing new algorithms for 
representing these features, new descriptors and functions 
such as those involving DNA, RNA and other nucleotides 
can be integrated into PROFEAT in the near future to 
better facilitate the study of molecular and bio-molecular 
functions and interactions. 

SUPPLEMENTARY DATA 

Supplementary Data are available at NAR Online. 